Reducing TPC-H Benchmarking Time
Abstract
Benchmarking a system can be time-consuming, so many researchers have developed kernels and microbenchmarks. Nevertheless, these programs cannot capture the details of a full application; complex database applications are one such example. In this work we present a methodology based on a statistical method, Principal Component Analysis (PCA), to reduce the execution time of TPC-H, a decision support benchmark. The technique selects a relevant subset of queries from the original set that may be used to evaluate systems. We use the subsets to determine the ranking of different computer systems. Our experiments show that a small subset of 5 queries ranks different systems with more than 80% accuracy relative to the original order, using as little as 20% of the original benchmark execution time.
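The evaluation idea in the abstract (rank systems with a query subset, then compare against the ranking from the full query set) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the timings are invented, the system and query names are hypothetical, and "accuracy" is taken to mean the fraction of system pairs ordered the same way in both rankings, which the abstract does not define. The PCA-based selection step itself is not shown; the subset is simply given.

```python
from itertools import combinations

# Hypothetical per-query execution times in seconds (invented values,
# not measurements from the paper). system -> {query -> time}.
times = {
    "sys_a": {"Q1": 10.0, "Q2": 4.0, "Q3": 7.0, "Q4": 3.0, "Q5": 6.0},
    "sys_b": {"Q1": 12.0, "Q2": 5.0, "Q3": 6.0, "Q4": 4.0, "Q5": 9.0},
    "sys_c": {"Q1": 8.0, "Q2": 3.0, "Q3": 9.0, "Q4": 2.0, "Q5": 5.0},
    "sys_d": {"Q1": 15.0, "Q2": 6.0, "Q3": 8.0, "Q4": 5.0, "Q5": 7.0},
}
all_queries = ["Q1", "Q2", "Q3", "Q4", "Q5"]


def rank_systems(times, queries):
    """Order systems from fastest to slowest by total time over `queries`."""
    totals = {s: sum(t[q] for q in queries) for s, t in times.items()}
    return sorted(totals, key=totals.get)


def pairwise_agreement(reference, candidate):
    """Fraction of system pairs ordered the same way in both rankings.

    This is one possible reading of "ranking accuracy" (an assumption).
    """
    pos_ref = {s: i for i, s in enumerate(reference)}
    pos_cand = {s: i for i, s in enumerate(candidate)}
    pairs = list(combinations(reference, 2))
    same = sum(
        (pos_ref[a] < pos_ref[b]) == (pos_cand[a] < pos_cand[b])
        for a, b in pairs
    )
    return same / len(pairs)


def time_fraction(times, subset, queries):
    """Share of the full benchmark time that the subset accounts for."""
    sub = sum(t[q] for t in times.values() for q in subset)
    full = sum(t[q] for t in times.values() for q in queries)
    return sub / full


full_rank = rank_systems(times, all_queries)
subset = ["Q2", "Q4"]  # hypothetical subset, not chosen by PCA here
subset_rank = rank_systems(times, subset)

print("full ranking:  ", full_rank)
print("subset ranking:", subset_rank)
print("agreement:     ", pairwise_agreement(full_rank, subset_rank))
print("time fraction: ", round(time_fraction(times, subset, all_queries), 3))
```

With these invented numbers, the two-query subset reproduces the full ranking while covering roughly a quarter of the total time; a poorly chosen subset such as `["Q3"]` agrees on far fewer pairs, which is the kind of trade-off a selection method has to manage.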
Similar Resources
Architecture and Performance Characteristics of a PostgreSQL Implementation of the TPC-E and TPC-V Workloads
The TPC has been developing a publicly available, end-to-end benchmarking kit to run the new TPC-V benchmark, with the goal of measuring the performance of databases subjected to the variability and elasticity of load demands that are common in cloud environments. This kit is being developed completely from scratch in Java and C++ with PostgreSQL as the target database. Since the TPC-V workload...
Benchmarking Hybrid OLTP&OLAP Database Systems
Recently, the case has been made for operational or real-time Business Intelligence (BI). As the traditional separation into OLTP database and OLAP data warehouse obviously incurs severe latency disadvantages for operational BI, hybrid OLTP&OLAP database systems are being developed. The advent of the first generation of such hybrid OLTP&OLAP database systems requires means to characterize their...
Pig vs Hive: Benchmarking High Level Query Languages
This article presents benchmarking results of two benchmarking sets (run on small clusters of 6 and 9 nodes) applied to Hive and Pig running on Hadoop 0.14.1. The first set of results was obtained by replicating the Apache Pig benchmark published by the Apache Foundation on 11/07/07 (which served as a baseline to compare major Pig Latin releases). The second set of results was obtained by applying ...
Benchmarking attribute cardinality maps for database systems using the TPC-D specifications
Benchmarking is an important phase in developing any new software technique because it helps to validate the underlying theory in the specific problem domain. But benchmarking of new software strategies is a very complex problem, because it is difficult (if not impossible) to test, validate and verify the results of the various schemes in completely different settings. This is even more true in...
Benchmarking Using Basic DBMS Operations
The TPC-H benchmark proved successful in the decision support area. Many commercial database vendors and their related hardware vendors used it to show the superiority and competitive edge of their products. However, over time, TPC-H became less representative of industry trends as vendors kept tuning their databases to this benchmark-specific workload. In this paper, we ...